Identifying Quick Starters: Towards an Integrated Framework for Efficient Predictions of Queue Waiting Times of Batch Parallel Jobs
نویسندگان
چکیده
Production parallel systems are space-shared and hence employ batch queues in which the jobs submitted to the systems are made to wait before execution. Thus, jobs submitted to parallel batch systems incur queue waiting times in addition to the execution times. Prediction of these queue waiting times is important to provide overall estimates to the users and can also help metaschedulers make scheduling decisions. Analyses of the job traces of supercomputers reveal that about 56 to 99% of the jobs incur queue waiting times of less than an hour. Hence, identifying these quick starters or jobs with short queue waiting times is essential for overall improvement on queue waiting time predictions. Existing strategies provide high overestimates of upper bounds of queue waiting times rendering the bounds less useful for jobs with short queue waiting times. In this work, we have developed an integrated framework that uses the job characteristics, and states of the queue and processor occupancy to identify and predict quick starters, and use the existing strategies to predict jobs with long queue waiting times. Our experiments with different production supercomputer job traces show that our prediction strategies can lead to correct identification of up to 20 times more quick starters and provide tighter bounds for these jobs, and thus result in up to 64% higher overall prediction accuracy than existing methods.
منابع مشابه
Qespera: an adaptive framework for prediction of queue waiting times in supercomputer systems
Production parallel systems are space-shared, and resource allocation on such systems is usually performed using a batch queue scheduler. Jobs submitted to the batch queue experience a variable delay before the requested resources are granted. Predicting this delay can assist users in planning experiment time-frames and choosing sites with less turnaround times and can also help meta-schedulers...
متن کاملPrediction of Queue Waiting Times for Metascheduling on Parallel Batch Systems
Prediction of queue waiting times of jobs submitted to production parallel batch systems is important to provide overall estimates to users and can also help meta-schedulers make scheduling decisions. In this work, we have developed a framework for predicting ranges of queue waiting times for jobs by employing multi-class classification of similar jobs in history. Our hierarchical prediction st...
متن کاملS : An Efficient Shared Scan Scheduler on MapReduce Framework
Hadoop, an open-source implementation of MapReduce, has been widely used for data-intensive computing. In order to improve performance, multiple jobs operating on a common data file can be processed as a batch to share the cost of scanning the file. However, in practice, jobs often do not arrive at the same time, and batching them means longer waiting time for jobs that arrive earlier. In this ...
متن کاملA Mathematical Analysis on Linkage of a Network of Queues with Two Machines in a Flow Shop including Transportation Time
This paper represents linkage network of queues consisting of biserial and parallel servers linked to a common server in series with a flowshop scheduling system consisting of two machines. The significant transportation time of the jobs from one machine to another is also considered. Further, the completion time of jobs/customers (waiting time + service time) in the queue network is the set...
متن کاملUsing Model-Based Clustering to Improve Predictions for Queueing Delay on Parallel Machines
Most space-sharing parallel computers presently operated by production high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources may choose among different queues (charging different amounts) potentially on a number of machines to which they have access. In such a situation, the amount of time a ...
متن کامل